Probability Redistribution using Time Hopping for Reinforcement Learning
Authors
Abstract
A method for using the Time Hopping technique as a tool for probability redistribution is proposed. Applied to reinforcement learning in a simulation, it can reshape the state probability distribution of the underlying Markov decision process as desired. This is achieved by appropriately modifying the target selection strategy of Time Hopping. Experiments on a robot maze reinforcement learning problem show that the method improves exploration efficiency by reshaping the state probability distribution into an almost uniform one.
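The core idea — biasing Time Hopping's target selection so that rarely visited states are chosen more often, flattening the state distribution — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the inverse-visit-count weighting and the function name `choose_hop_target` are assumptions for the sake of the example.

```python
import random

def choose_hop_target(states, visit_counts, rng=random):
    """Pick the next Time Hopping target state.

    Weights are inversely proportional to (1 + visit count), so
    under-visited states are hopped to more often, pushing the
    empirical state distribution toward uniform. (Illustrative
    weighting; the paper's actual strategy may differ.)
    """
    weights = [1.0 / (1 + visit_counts[s]) for s in states]
    total = sum(weights)
    # roulette-wheel selection over the normalized weights
    r = rng.random() * total
    acc = 0.0
    for s, w in zip(states, weights):
        acc += w
        if r < acc:
            return s
    return states[-1]
```

In a simulation loop, each hop would jump the simulated time to the chosen state and resume off-policy learning from there, so the redistribution costs no extra environment steps.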
Similar resources
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides for Time Hopping similar abilities to what eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transitions graph....
Full text
Time Hopping technique for faster reinforcement learning in simulations
A technique called Time Hopping is proposed for speeding up reinforcement learning algorithms. It is applicable to continuous optimization problems running in computer simulations. Making shortcuts in time by hopping between distant states combined with off-policy reinforcement learning allows the technique to maintain higher learning rate. Experiments on a simulated biped crawling robot confir...
Full text
Reinforcement Learning for Reactive Jamming Mitigation
In this paper, we propose a strategy to avoid or mitigate reactive forms of jamming using a reinforcement learning approach. The mitigation strategy focuses on finding an effective channel hopping and idling pattern to maximize link throughput. Thus, the strategy is well-suited for frequency-hopping spread spectrum systems, and best performs in tandem with a channel selection algorithm. By usin...
Full text
Time Hopping Technique for Reinforcement Learning and its Application to Robot Control
To speed up the convergence of reinforcement learning (RL) algorithms by more efficient use of computer simulations, three algorithmic techniques are proposed: Time Manipulation, Time Hopping, and Eligibility Propagation. They are evaluated on various robot control tasks. The proposed Time Manipulation [1] is a concept of manipulating the time inside a simulation and using it as a tool to speed...
Full text